GROUP SIZE AND TREATMENT INTENSITY


Abstract 
Group size and treatment intensity are understudied topics in mathematics intervention research.
This study examined whether the treatment intensity and overall intervention effects of an
empirically validated Tier 2 mathematics intervention varied between intervention groups with
2:1 and 5:1 student-teacher ratios. Student practice opportunities and the quality of explicit
instruction served as treatment intensity metrics. A total of 465 kindergarten students with
mathematics difficulties from 136 intervention groups participated. Results indicated
comparable performance of the 2:1 and 5:1 intervention groups on six outcome measures.
Observation data indicated that the intensity of student practice opportunities differed by group
size. Students in the 5:1 groups received more opportunities to practice with their peers, whereas
students in the 2:1 groups participated in more frequent and higher-quality individualized
practice opportunities. Implications for delivering Tier 2 interventions in small-group
formats and engaging at-risk learners in meaningful practice opportunities are discussed.


Keywords: treatment intensity, group size, student practice opportunities, explicit 


Examining the Impact of Group Size on the Treatment Intensity of a 
Tier 2 Mathematics Intervention 

Within multi-tiered approaches to mathematics instruction and Response to Intervention 
(RtI) frameworks (Fuchs & Vaughn, 2012), treatment intensity is generally conceptualized as an 
alterable variable that can be purposefully manipulated to maximize student learning and obtain 
an optimal level of instruction (Warren, Fey, & Yoder, 2007). For example, in a three-tiered
model, if a student does not adequately respond to Tier 1 mathematics instruction, the goal of 
Tier 2 is to provide a more systematic, intensive experience. A similar increase in intensity is 
conceptualized in moving from Tier 2 to Tier 3. 

Instructional time is often recognized as a variable of treatment intensity, and researchers have
recently focused on ways to intensify mathematics instruction by increasing instructional time,
such as the length of each session and the number of sessions per week.
Bryant et al. (2011) systematically increased the amount of instructional time across a series of 
mathematics intervention studies. In their most recent study, Bryant and colleagues (2011) found 
that increased intervention time was a decisive factor in improving the mathematics achievement 
of students with mathematics difficulties (MD). 

More recently, Codding et al. (2016) investigated variations in the treatment dosage of a
small-group intervention focused on whole number operations. A total of 101 second-, third-,
and fourth-grade students with MD were randomly assigned to one of three dosage conditions
(sessions once, twice, or four times per week) or a control condition. Findings from a proximal
outcome measure suggested that students taught in the highest-dosage small groups (i.e., four
sessions per week) outperformed their peers in the control and the other two dosage conditions.
Existing frameworks of treatment intensity also identify group size, or the instructional
format in which an intervention is delivered, as an effective way to increase treatment intensity
(e.g., Codding & Lane, 2015). Group size is an instructional variable backed by
substantial empirical evidence, particularly in early literacy (Elbaum, Vaughn,
Moody, Hughes, & Moody, 2000). Although small-group instruction in mathematics
does not have the same level of empirical support as in reading, this instructional format
theoretically offers an effective means of intensifying mathematics instruction for students with MD.
Yet, few studies have examined the effect of group size on the treatment intensity of early 
mathematics interventions. 

Author et al. (2017) conducted a recent randomized controlled trial in which they 
experimentally investigated the impact of group size on the treatment intensity and student 
mathematics outcomes in the context of ROOTS, an evidence-based Tier 2 kindergarten 
mathematics intervention (Author et al., 2015; Author et al., 2016). Participating in the study 
were approximately 600 students from 60 kindergarten classrooms in Oregon. These 
kindergarten students represented the first two cohorts of the larger, federally-funded ROOTS 
Efficacy Project. Aligned with other Tier 2 mathematics interventions (Bryant et al., 2011), the 
ROOTS program is delivered in small-group formats and is designed to promote number sense 
development among students with MD. To build students’ conceptual understanding of and 
procedural fluency with whole numbers and operations, the 50-lesson ROOTS program centers 
on a systematic and explicit instructional design framework (Archer & Hughes, 2011; Gersten et 
al., 2009). In this way, the intervention engages students in purposefully planned and explicitly 
delivered mathematics tasks and activities. 


To examine the effect of group size on the intervention impact and treatment intensity of
ROOTS, Author et al. (2017) focused on two treatment conditions. In one condition, students
were randomly assigned to receive the ROOTS intervention in groups with 2:1 student-teacher
ratios (2:1 ROOTS groups); in the other, students received ROOTS in groups with 5:1
student-teacher ratios (5:1 ROOTS groups). Random assignment produced 60 2:1 ROOTS groups
and 59 5:1 ROOTS groups. Students in both treatment conditions continued to
receive core mathematics instruction.

Since the ROOTS intervention was purposefully designed to deeply engage students in 
foundational whole number concepts and skills, Author et al. (2017) used the frequency of 
student practice opportunities as a metric of treatment intensity. Research suggests that
frequent practice opportunities are critical for fostering mathematics proficiency among students
with and without MD (Clements, Agodini, & Harris, 2013; Gersten et al., 2009; Author et al.,
2015). Similar to other explicit mathematics interventions (Bryant et al., 2011; Sood & Jitendra,
2013), the ROOTS intervention offers students guided practice opportunities to promote a
high success rate with the targeted mathematical content. In ROOTS, such practice opportunities 
consist of individual students or the group at large working with concrete representations of 
mathematical ideas and engaging in mathematics verbalizations. Group response opportunities 
allow all students to practice in unison, whereas individualized practice permits an opportunity 
for one student to convey or demonstrate her mathematical thinking, understanding, and 
reasoning. 

While the ROOTS intervention was designed to provide intensive learning experiences
regardless of group size, Author et al. (2017) hypothesized that, because of their lower
student-teacher ratio, the 2:1 ROOTS groups would demonstrate stronger treatment effects and
receive more opportunities to practice than the 5:1 ROOTS groups. Author et al. (2017) reported
nonsignificant differences in student mathematics outcomes between the 2:1 and 5:1
ROOTS groups, suggesting that the impact of ROOTS was essentially the same
regardless of whether students participated in the 2:1 or 5:1 groups. Results for their second
hypothesis, however, indicated that the frequency of practice opportunities students
received differed by group size: students in the 2:1 ROOTS groups received more
opportunities to practice individually, whereas students in the 5:1 ROOTS groups participated in
more group-level practice.

In sum, the study conducted by Author et al. (2017), as part of the larger ROOTS Efficacy
Project, represented the initial investigation of group size and treatment intensity in the
ROOTS intervention. Confirming its reported findings and establishing their generalizability
within a planned sequence of replication (Coyne, Cook, & Therrien, 2016) was therefore
considered crucial to the broader contributions of our program of research. As such, continued
research on the treatment intensity of the ROOTS intervention involving kindergarten students
from other geographical regions was deemed warranted.

Purpose of the Study 

The purpose of this randomized controlled trial (RCT) was to extend the existing 
literature on Tier 2 mathematics interventions by investigating the extent to which the purposeful 
manipulation of group size affected the overall intervention impact and treatment intensity of the 
ROOTS intervention. Participating in the RCT were 72 kindergarten classrooms from 2 school 
districts in the metropolitan area of Boston, MA. In conducting the current study in Boston, we 
sought to determine whether the results of Author et al. (2017) generalized across instructional 
settings and participants from a different geographical region of the U.S. As noted by Coyne et
al. (2016), accumulating converging evidence through systematic replication increases the
trustworthiness of information about an intervention or approach, such as the appropriate group
size for delivering mathematics interventions to students with MD. Therefore, extending the
research of Author et al. (2017), the current study assessed whether the overall intervention
impact and treatment intensity (as measured by the frequency of student practice opportunities)
of the ROOTS intervention varied across small-group formats with 2:1 and 5:1 student-teacher
ratios. To expand our work on treatment intensity, we also investigated whether the quality of
explicit instruction delivered during the ROOTS intervention varied by group size.

Three research questions were investigated: (1) Does the effect of the ROOTS 
intervention on student mathematics achievement vary by group size (2:1 ROOTS group vs. 5:1 
ROOTS group)? (2) Does the frequency of student practice opportunities facilitated during 
ROOTS instruction vary by group size (2:1 ROOTS group vs. 5:1 ROOTS group)? (3) Does the 
quality of explicit instruction during the ROOTS intervention vary by group size (2:1 ROOTS 
group vs. 5:1 ROOTS group)? 

Method 

This study employed a randomized block design (blocking on classrooms) in which students
within classrooms were randomly assigned to one of three conditions: 2:1 ROOTS groups, 5:1
ROOTS groups, or a no-treatment control condition. Because a separate line of research has
demonstrated the efficacy of the ROOTS intervention relative to a core mathematics program
(Author et al., 2015a) and no-treatment control conditions (Author et al., 2016a; Author et al.,
2016b), the primary focus of the current study was a comparison between the 2:1 and 5:1
ROOTS groups. Thus, the current analyses did not include students in the control condition. The
study was conducted across two years, with Year 1 and Year 2 representing the 2014-2015 and
2015-2016 school years, respectively. Each study year involved a different cohort of
kindergarten students. These cohorts represented the final two cohorts of the larger, federally
funded ROOTS Efficacy Project.

Participants 

Schools. Nine elementary schools from two school districts in the Boston, MA, area participated
in both Year 1 and Year 2 of the present study. District A had a total enrollment of 6,118
students in Year 1 and 6,350 students in Year 2. All kindergarten students in District A attended
the same school. District B had a total enrollment of 6,834 students in Year 1 and 6,721 students
in Year 2. Eight separate schools from District B participated in the study.

Classrooms. Participants were drawn from 36 classrooms in Year 1 and 36 classrooms in
Year 2 (N = 72). Of the 36 Year 1 classrooms, 28 also participated in Year 2; thus, a total of 44
distinct kindergarten classrooms participated across the study. Half-day kindergarten programs
were offered in 11 classrooms in Year 1 and 7 classrooms in Year 2. Across both years of the
study, the average classroom size was 24.9 students (SD = 5.9).

Kindergarten classrooms were taught by 44 certified kindergarten teachers, of whom 39 
provided the following demographic information: 100% of teachers identified as female and 92% 
as White. Teachers had an average of 14.0 years of teaching experience and 8.9 years of 
kindergarten teaching experience. Of the 44 teachers, 72% had a master’s degree in
education, and 51% had completed a college-level algebra course.

Criteria for participation. In each participating classroom, all students with parental 
consent were screened in the late fall of their kindergarten year. The screening process included 
the Assessing Student Proficiency in Early Number Sense (ASPENS; Author et al., 2011) and
the Number Sense Brief (NSB; Jordan, Glutting, & Ramineni, 2010), both standardized
measures of early mathematics proficiency. Students were eligible for the ROOTS intervention,
and thus considered at risk for MD, if they received an NSB score of 20 or less and an ASPENS
composite score in the strategic or intensive range.

Once students were determined eligible for the ROOTS intervention, the project’s
independent evaluator separately converted students’ NSB and ASPENS scores into standard
scores and then combined the two standard scores to form an overall composite score for each
at-risk student. Composite scores within each classroom were then rank ordered, and the 10
lowest-scoring ROOTS-eligible students were randomly assigned to one of three conditions: (a)
2:1 ROOTS groups, (b) 5:1 ROOTS groups, or (c) a no-treatment control condition. As
previously noted, the current analyses included only those students randomly assigned to the
two ROOTS conditions.
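The eligibility rule and within-classroom assignment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the independent evaluator's procedure: the field names (`nsb`, `aspens_band`, `aspens_composite`), the use of z-scores as the "standard scores," and the 2/5/3 split of the 10 lowest-scoring students across the 2:1, 5:1, and control conditions are all hypothetical (the split is merely consistent with the reported condition sample sizes).

```python
import random
from statistics import mean, pstdev

# Hypothetical labels for the at-risk ASPENS ranges named in the text.
AT_RISK_BANDS = {"strategic", "intensive"}

# Hypothetical split of the 10 lowest-scoring students (2 + 5 + 3 = 10).
SLOTS = (("2:1", 2), ("5:1", 5), ("control", 3))


def is_eligible(student):
    """Eligibility rule from the text: NSB <= 20 and an at-risk ASPENS range."""
    return student["nsb"] <= 20 and student["aspens_band"] in AT_RISK_BANDS


def z_scores(values):
    """Standardize raw scores (assumed stand-in for the evaluator's standard scores)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(v - mu) / sd for v in values]


def add_composites(students):
    """Composite = sum of the NSB and ASPENS standard scores."""
    nsb_z = z_scores([s["nsb"] for s in students])
    aspens_z = z_scores([s["aspens_composite"] for s in students])
    for s, a, b in zip(students, nsb_z, aspens_z):
        s["composite"] = a + b


def assign_classroom(eligible, rng):
    """Randomly assign the 10 lowest composites in one (possibly virtual) classroom."""
    lowest = sorted(eligible, key=lambda s: s["composite"])[:10]
    if len(lowest) < 10:
        raise ValueError("fewer than 10 eligible students; pool classrooms first")
    rng.shuffle(lowest)
    assignment, start = {}, 0
    for condition, size in SLOTS:
        for s in lowest[start:start + size]:
            assignment[s["id"]] = condition
        start += size
    return assignment
```

Passing a seeded generator such as `random.Random(2014)` makes an assignment reproducible for auditing. The caller is expected to hand `assign_classroom` a combined roster when classrooms are pooled into a "virtual" classroom.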

Of the 36 participating classrooms in each year of the study, 26 had at least 10
students who met the eligibility criteria; the remaining 10 classrooms in each year had
fewer than 10 ROOTS-eligible students. In these instances, we combined at-risk
students across classrooms to meet the 10-student requirement for random assignment. For
example, in Year 1, at-risk students from two classrooms were combined to form a “virtual”
ROOTS classroom, which provided a 2:1 ROOTS group and a 5:1 ROOTS group. After these
cross-class grouping procedures were applied, a total of 136 ROOTS groups were formed (n = 69
2:1 ROOTS groups and n = 67 5:1 ROOTS groups).

Students. A total of 1,580 kindergarten students were screened for ROOTS eligibility. Of
these students, 659 met eligibility criteria and were randomly assigned to the two-student group
condition (n = 138), the five-student group condition (n = 327), or the no-treatment control
condition (n = 194). See Table 1 for demographic information on the ROOTS students.
Interventionists. ROOTS intervention groups were taught by district-employed
instructional assistants (75%) and by interventionists hired specifically for this study (25%). All
interventionists were female; 61% identified as White and 20% as Hispanic. Most
interventionists had prior experience with small-group instruction (91%) and held a bachelor’s
degree or higher (63%); on average, they had 13 years of teaching experience. Half of the
interventionists (50%) had taken a college-level algebra course, and 24% held a current teaching
license.
Procedures 

ROOTS Intervention. ROOTS is a Tier 2 kindergarten mathematics intervention
program that consists of 50 lessons delivered in small-group formats. The primary aim of
ROOTS is to support kindergarten students with MD in developing a robust understanding of 
whole number concepts and skills. Specifically, the ROOTS intervention prioritizes concepts 
from the Counting and Cardinality and Operations and Algebraic Thinking domains of the 
Common Core State Standards for mathematics (2010). ROOTS promotes students’ mathematics 
proficiency by judiciously including essential features of explicit mathematics instruction, such 
as teacher modeling, student practice opportunities, and teacher-provided academic feedback. Of 
particular relevance to the current study’s investigation of treatment intensity is the frequency 
and quality of student practice opportunities prioritized by the ROOTS intervention. Such 
practice opportunities include students verbalizing their mathematical thinking and 
understanding, and working with visual representations of mathematical ideas (e.g., base-ten 
blocks). 

In the current study, ROOTS was delivered in small-group formats (2:1 or 5:1 student-teacher
ratios). Interventionists delivered the 20-minute lessons five days per week for
approximately 10 weeks. ROOTS instruction began in late fall and ended in the spring. Because
ROOTS is a supplemental intervention, it was delivered in addition to students’ core mathematics
instruction.

Professional Development. All participating interventionists received two five-hour 
professional development workshops. The first workshop focused on Lessons 1-25, while the 
second workshop targeted Lessons 26-50. Both workshops, which were delivered by project 
staff, also gave interventionists exposure to empirically validated practices of mathematics 
instruction and small-group management techniques. During the workshops, interventionists
practiced lesson delivery and received feedback. To bolster
implementation, all interventionists received in-class coaching support during the intervention. 
Coaching visits offered feedback on the fidelity and quality of intervention implementation. Each 
intervention group received two coaching visits over the course of the study. 

Fidelity of Implementation. To determine the extent to which the ROOTS
intervention was delivered as intended, fidelity of ROOTS implementation was directly
observed. Observers used a four-point scale (4 = all, 3 = most, 2 = some, 1 = none) to rate the 
extent to which interventionists met the lesson’s instructional objectives, followed the provided 
teacher scripting, and used the prescribed math models for that lesson. Observers also recorded 
the number of prescribed activities delivered during the lesson. Overall, observations indicated 
that instruction in the 2:1 and 5:1 ROOTS groups was delivered with similar levels of 
implementation fidelity. As shown in Table 3, no significant differences in fidelity of 
implementation were observed between the 2:1 and 5:1 ROOTS groups (p’s > .18). 

Core Mathematics Instruction. Throughout the study, ROOTS students continued to 
receive core mathematics instruction delivered in their kindergarten classroom. Survey data
indicated that teachers in District A primarily used the Scott Foresman mathematics curriculum
during core mathematics instruction, while teachers in District B primarily used the
enVisionMath curriculum. Teachers also supplemented core instruction with their own materials.
Teachers reported that they provided an average of 32.8 minutes of mathematics instruction per 
day (SD = 22.8). Teachers also noted that a main instructional focus when teaching whole 
number concepts was reading number names and knowing the count sequence. 

Measures 

All ROOTS students were administered five measures of whole number understanding at 
pretest and posttest. One distal measure of mathematics achievement was administered at 
posttest only. Trained research staff administered all student measures. Inter-scorer reliability 
criteria were met for all assessments (i.e., >.95). 

ROOTS Assessment of Early Numeracy Skills (RAENS; Author et al., 2012b) is a 
researcher-developed, individually administered measure that consists of 32 items. Items assess 
aspects of counting and cardinality, number operations, and the base-10 system. In an untimed 
setting, students are asked to count and compare groups of objects; write, order, and compare
numbers; label visual models (e.g., ten-frames); and write and solve single-digit addition
expressions and equations. Predictive validity coefficients of the RAENS with the TEMA-3 and
the NSB range from .68 to .83. Inter-rater scoring agreement is reported at 100% (Author et al., 2016b).

Oral Counting — Early Numeracy Curriculum-Based Measurement (Author & 
Author, 2004). This curriculum-based measure asks students to count aloud in English for one
minute; administration is discontinued after the first counting error. The highest number
counted correctly is the student’s score. Test-retest and alternate-form reliabilities are
reported above .80, concurrent validity ranges from .49 to .70, and predictive validity with
standardized measures of mathematics ranges from .46 to .72.
Assessing Student Proficiency in Early Number Sense (ASPENS; Author et al.,
2012a) is a set of three curriculum-based measures validated for screening and progress
monitoring in kindergarten mathematics. Each 1-minute fluency-based measure assesses an
important aspect of early numeracy proficiency: number identification, magnitude
comparison, or missing number. Test-retest reliabilities of kindergarten ASPENS measures are
in the moderate to high range (.74 to .85). Predictive validity of fall scores on the kindergarten 
ASPENS measures with spring scores on the TerraNova 3 is reported as ranging from .45 to .52. 

Number Sense Brief Screener (NSB; Jordan et al., 2008) is an individually administered 
measure with 33 items that assess counting knowledge and principles, number recognition,
number comparisons, nonverbal calculation, story problems, and number combinations. The NSB
has a coefficient alpha of .84.

Test of Early Mathematics Ability — Third Edition (TEMA-3; Ginsburg & Baroody, 
2003) is a standardized, norm-referenced, individually administered measure of beginning 
mathematical ability. The TEMA-3 assesses whole number understanding for children ranging in 
age from 3 to 8 years 11 months. Alternate-form and test-retest reliabilities of the TEMA-3 are 
.97 and .93, respectively. The TEMA-3 has concurrent validity with other mathematics measures 
ranging from .54 to .91. 

Stanford Early School Achievement Test (SESAT; Harcourt Brace & Company, 2003) 
is a group-administered, standardized, norm-referenced measure with two mathematics subtests:
Problem Solving and Procedures. The internal consistency of the SESAT is .88. ROOTS
students were administered the SESAT at posttest only.


Observations of ROOTS Instruction 
Each ROOTS group was observed approximately three times (M = 2.9, SD = 0.8) over
the course of the intervention, with approximately three weeks between observation
occasions. A total of 391 observations were conducted, 124 of which included two observers.
Trained observers, who were blind to our research hypotheses, conducted all observations using 
two observation measures. 

Classroom Observations of Student-Teacher Interactions—Mathematics (COSTI-M; 
Author et al., 2015). Observers used the COSTI-M to document four types of student practice 
opportunities in the 2:1 and 5:1 ROOTS groups. These practice opportunities, which served as 
metrics of treatment intensity, included: (a) individual practice, (b) group practice, (c) guided 
practice, and (d) independent practice. Although the COSTI-M also documents teachers’ use of
explicit demonstrations and provision of academic feedback, the current study focused
specifically on student practice opportunities because prior research with the COSTI-M has
linked them to student mathematics achievement (Author et al., 2015a). Individual
practice represented a practice opportunity provided to one student, while group practice 
represented a practice opportunity provided to two or more students. Individual and group 
practice included student mathematics verbalizations and opportunities to manipulate concrete 
representations of mathematical ideas (e.g., base-ten blocks). Observers also coded whether 
individual and group practice entailed concurrent teacher support (guided or independent). 
Guided practice was operationally defined as an opportunity for one or more students to 
verbalize or physically demonstrate their mathematical understanding with concurrent 
instructional support from the teacher. Independent practice represented an opportunity for one
or more students to verbalize or physically demonstrate their mathematical understanding
without teacher support. Rates of these practice opportunities were calculated by dividing their
observed frequency by the number of instructional minutes. Analyses also included an “all
practice” variable, which comprised guided, independent, group, and individual practice
opportunities. Mean rates of practice opportunities across observation occasions were calculated
and used as treatment intensity predictors in subsequent analyses.
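The rate calculation described above can be sketched as follows. As a hypothetical reading of the coding scheme, we assume every observed practice opportunity receives both a recipient code (individual or group) and a support code (guided or independent); under that assumption the "all practice" rate counts each opportunity once rather than summing all four codes. Function names are illustrative only.

```python
from statistics import mean

RECIPIENTS = ("individual", "group")
SUPPORT = ("guided", "independent")


def occasion_rates(events, minutes):
    """Per-minute practice rates for one observation occasion.

    events: list of (recipient, support) tuples, one per coded opportunity,
    e.g. ("group", "guided"). This double-coding scheme is an assumption.
    """
    if minutes <= 0:
        raise ValueError("instructional minutes must be positive")
    counts = {key: 0 for key in RECIPIENTS + SUPPORT}
    for recipient, support in events:
        if recipient not in RECIPIENTS or support not in SUPPORT:
            raise ValueError(f"unknown code: {(recipient, support)}")
        counts[recipient] += 1
        counts[support] += 1
    rates = {key: n / minutes for key, n in counts.items()}
    # Each opportunity is counted once in "all practice".
    rates["all"] = len(events) / minutes
    return rates


def group_intensity(occasions):
    """Mean per-minute rates across a group's observation occasions.

    occasions: list of (events, minutes) pairs.
    """
    if not occasions:
        raise ValueError("at least one observation occasion is required")
    per_occasion = [occasion_rates(events, minutes) for events, minutes in occasions]
    return {key: mean(r[key] for r in per_occasion) for key in per_occasion[0]}
```

Averaging rates rather than pooling counts over total minutes weights each observation occasion equally, which matches the description of mean rates across occasions.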

Quality of Explicit Mathematics Instruction (QEMI; Author et al., 2012) is a broad 
measure of instruction quality. The QEMI comprises seven items that target the quality of 
explicit mathematics instruction, including group and individual practice opportunities, student 
participation, teacher modeling, academic feedback, efficiency of instructional delivery, and 
instructional scaffolding. Internal consistency of the measure was high (coefficient alpha = .93).
To rate the quality of each item, observers used a 4-point rating scale, with scores of 1 to 2
representing the lower quality range and scores of 3 to 4 representing the upper quality range. Observers
completed the QEMI at the conclusion of each observation occasion. Total QEMI scores were 
computed as the mean across all items. The mean across the three observations for the 2:1 and 
5:1 ROOTS groups was used as a treatment intensity predictor in subsequent analyses. 
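The two-step QEMI scoring (item mean per observation, then mean across observations) can be sketched as below. The short item labels are hypothetical stand-ins for the seven QEMI items listed above; the actual item wording is not reproduced here.

```python
from statistics import mean

# Hypothetical short labels for the seven QEMI items listed in the text.
QEMI_ITEMS = (
    "group_practice",
    "individual_practice",
    "student_participation",
    "teacher_modeling",
    "academic_feedback",
    "delivery_efficiency",
    "scaffolding",
)


def qemi_total(ratings):
    """Total score for one observation: the mean of the seven 1-4 item ratings."""
    missing = [item for item in QEMI_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing QEMI items: {missing}")
    for item in QEMI_ITEMS:
        if not 1 <= ratings[item] <= 4:
            raise ValueError(f"{item} must be rated on the 1-4 scale")
    return mean(ratings[item] for item in QEMI_ITEMS)


def group_qemi(occasions):
    """Treatment intensity predictor: mean total score across a group's observations."""
    if not occasions:
        raise ValueError("at least one observation is required")
    return mean(qemi_total(r) for r in occasions)
```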

Estimates of Inter-Observer Reliability and Stability. Inter-observer reliability for the
COSTI-M variables, expressed as intraclass correlation coefficients (ICCs), was
as follows: .89 for all practice, .92 for individual practice, .91 for group practice, .41 for guided
practice, and .79 for independent practice. ICCs for the QEMI’s total score (i.e., overall quality
of explicit mathematics instruction) and the ROOTS fidelity of implementation tool were .93 and
.88, respectively. According to the guidelines proposed by Landis and Koch (1977), these ICCs
indicate moderate to almost perfect inter-observer agreement.
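The Landis and Koch (1977) benchmarks can be applied mechanically to the reported coefficients. The sketch below uses the conventional cut points (.00 to .20 slight, .21 to .40 fair, .41 to .60 moderate, .61 to .80 substantial, .81 to 1.00 almost perfect). Note that these benchmarks were proposed for kappa statistics and are applied to ICCs here only by analogy, as in the text.

```python
# Landis and Koch (1977) benchmarks, originally proposed for kappa statistics.
BENCHMARKS = (
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
)


def landis_koch_label(coefficient):
    """Map an agreement coefficient to its Landis and Koch label."""
    if coefficient > 1:
        raise ValueError("agreement coefficients cannot exceed 1")
    if coefficient < 0:
        return "poor"
    for upper, label in BENCHMARKS:
        if coefficient <= upper:
            return label


# Inter-observer ICCs reported in the text.
reported = {
    "all practice": 0.89,
    "individual practice": 0.92,
    "group practice": 0.91,
    "guided practice": 0.41,
    "independent practice": 0.79,
    "QEMI total": 0.93,
    "fidelity of implementation": 0.88,
}

for name, icc in reported.items():
    print(f"{name}: {icc:.2f} ({landis_koch_label(icc)})")
```

Only guided practice (.41) falls in the "moderate" band, which is why the range is described as moderate to almost perfect.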

To provide an estimate of stability, ICCs were calculated across the three observations
within each ROOTS group. Stability ICCs for COSTI-M variables were as follows: .28 for all
practice, .23 for individual practice, .20 for group practice, .27 for guided practice, and .23 for
independent practice. These ICCs represent low stability, indicating that rates of practice
opportunities varied considerably across observation occasions within the same group. The
stability ICC for the QEMI was .32.
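A stability ICC of this kind can be approximated with the one-way random-effects ICC(1), which compares between-group and within-group mean squares: ICC(1) = (MS_B - MS_W) / (MS_B + (k - 1) MS_W). This sketch rests on two assumptions not stated in the text: a balanced design (every group observed the same k times, whereas groups here averaged 2.9 observations) and the ANOVA estimator rather than variance components from a fitted mixed model.

```python
from statistics import fmean


def icc1(groups):
    """One-way random-effects ICC(1) for a balanced design.

    groups: list of lists; each inner list holds the k repeated scores
    (e.g., practice rates across observation occasions) for one ROOTS group.
    """
    n = len(groups)
    k = len(groups[0]) if groups else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 groups, each with the same k >= 2 scores")
    grand = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

Low values such as those reported arise when occasion-to-occasion variation within a group (MS_W) is large relative to differences between groups (MS_B).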
Statistical Analysis 

For our first research question, we examined the effects of the 2:1 versus 5:1 condition on
student outcomes using a nested mixed-model (multilevel) time × condition analysis (Murray,
1998) to account for the intra-class correlation associated with students nested within groups.
The analysis tested for differences between conditions in gains on outcomes from the fall to
spring of kindergarten. The statistical model included time (coded 0 at pretest and 1 at posttest),
condition (coded 0 for 5:1 ROOTS groups and 1 for 2:1 ROOTS groups), and the interaction
between the two. A mixed analysis of covariance model was used for the SESAT, which was
measured only at posttest. Our second and third research questions examined whether 2:1 and
5:1 ROOTS groups experienced different rates of student practice opportunities or QEMI ratings,
using independent samples t-tests.
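One way to write the time × condition model described above is shown below. The random-effects structure (a group intercept and a student intercept) is our assumption for illustration; the text states only that the model accounts for students nested within groups.

```latex
\begin{aligned}
y_{tij} &= \beta_0 + \beta_1\,\mathrm{Time}_{t} + \beta_2\,\mathrm{Cond}_{j}
          + \beta_3\,(\mathrm{Time}_{t} \times \mathrm{Cond}_{j})
          + u_{j} + r_{ij} + e_{tij}, \\
u_{j} &\sim N(0, \sigma^2_u), \quad
r_{ij} \sim N(0, \sigma^2_r), \quad
e_{tij} \sim N(0, \sigma^2_e).
\end{aligned}
```

Here t indexes the assessment occasion (Time = 0 at pretest, 1 at posttest), i indexes students, and j indexes ROOTS groups, with Cond = 1 for 2:1 groups and 0 for 5:1 groups. Under this coding, beta_2 is the pretest difference between conditions, the expected gain is beta_1 for 5:1 groups and beta_1 + beta_3 for 2:1 groups, and beta_3 is therefore the difference in gains being tested. For the posttest-only SESAT, the analogous covariance model would drop the time terms and add a pretest covariate; which covariate was used is not specified in the text.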

Because students were randomly assigned within classrooms, we tested an additional set
of mixed models that extended those discussed above to account for clustering within
classrooms. Results were similar in both sets of models, and condition effects did not vary by
classroom, so these results are omitted.

Model estimation. We fit models to our data with SAS PROC MIXED version 9.2 (SAS
Institute Inc., 2009) using restricted maximum likelihood (REML), which is generally
recommended for multilevel models (Hox, 2002). Maximum likelihood estimation for the time ×
condition analysis uses all available data to provide potentially unbiased results even in the face
of substantial attrition, provided the missing data are missing at random (Graham, 2009). We did
not believe that attrition or other missing data represented a meaningful departure from the
missing at random assumption; that is, missing data likely did not depend on unobserved
determinants of the outcomes of interest (Little & Rubin, 2002).

The models assume independent and normally distributed observations. We addressed the
first, more important assumption (Van Belle, 2008) by explicitly modeling the multilevel nature
of the data. The data in the present study also did not markedly deviate from normality; skewness
and kurtosis fell within ±2.0 for all measures except oral counting, where kurtosis was 2.9.
Moreover, multilevel regression methods have been found to be quite robust to violations of
normality (e.g., Hannan & Murray, 1996).
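The ±2.0 screening check can be sketched as follows. The estimator used in the study is not reported; this sketch assumes the adjusted Fisher-Pearson skewness (G1) and adjusted excess kurtosis (G2) formulas commonly reported by statistical software, and it assumes the ±2.0 criterion applies to excess kurtosis (0 for a normal distribution).

```python
import math
from statistics import fmean


def _central_moments(xs):
    """Return (n, m2, m3, m4): population central moments of xs."""
    n = len(xs)
    m = fmean(xs)
    devs = [x - m for x in xs]
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    return n, m2, m3, m4


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    if len(xs) < 3:
        raise ValueError("need at least 3 scores")
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(xs):
    """Adjusted excess kurtosis G2 (needs n >= 4)."""
    if len(xs) < 4:
        raise ValueError("need at least 4 scores")
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


def within_bounds(xs, bound=2.0):
    """True if both |skewness| and |excess kurtosis| are at most `bound`."""
    return abs(sample_skewness(xs)) <= bound and abs(sample_excess_kurtosis(xs)) <= bound
```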

Effect sizes. To aid interpretation, we computed Hedges’ g (Hedges, 1981) for each
fixed effect, as recommended by the What Works Clearinghouse (2014). Assuming ICCs from .1
to .2, approximately 65 groups per condition, an average of 3.5 students per group, and pre-post
correlations of .50 and .71, the minimum detectable effect sizes (g) ranged from 0.23 to 0.32.
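The effect size computations can be sketched with the usual small-sample correction, omega = 1 - 3 / (4N - 9), applied to a standardized mean difference. The sketch also recovers g from an independent samples t statistic via d = t * sqrt(1/n1 + 1/n2). This is an illustration, not the study's code; in particular, using the numbers of groups formed (69 and 67) as the ns for the observation-level t-tests is our assumption.

```python
import math


def small_sample_correction(n_total):
    """Hedges' small-sample correction factor, omega = 1 - 3 / (4N - 9)."""
    if n_total < 3:
        raise ValueError("need at least 3 observations in total")
    return 1 - 3 / (4 * n_total - 9)


def pooled_sd(sd1, sd2, n1, n2):
    """Pooled within-condition standard deviation."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Bias-corrected standardized mean difference (condition 1 minus condition 2)."""
    return small_sample_correction(n1 + n2) * (mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)


def hedges_g_from_t(t, n1, n2):
    """Recover g from an independent samples t statistic."""
    return small_sample_correction(n1 + n2) * t * math.sqrt(1 / n1 + 1 / n2)
```

For example, `hedges_g_from_t(2.95, 69, 67)` returns about 0.50 (about 0.51 without the correction), close to the g = 0.51 reported for individual practice; the exact value depends on the ns and the correction used.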

Results 

Table 1 presents means, standard deviations, and sample sizes for the six dependent 
variables by assessment time and condition. Below we present results from tests of bias due to 
attrition, effects of the 2:1 versus 5:1 conditions on student outcomes, and differential rates of 
student practice opportunities and QEMI ratings between 2:1 and 5:1 conditions. 

Attrition 

Student attrition was defined as having data at pretest but missing data at posttest. Attrition rates were between 7% and 9% for all outcomes measured at posttest, and only 6% of students were missing all posttest data. The proportion of students missing all posttest data did not differ between the 2:1 and 5:1 conditions (χ²(1) = 3.03, p = .082). Although differential rates of attrition are undesirable, attrition that differentially affects math scores presents a far greater threat to internal validity, so we conducted an analysis to test whether student math scores were differentially affected by attrition across conditions. We examined the effects of condition, attrition status, and their interaction on pretest scores and found no statistically significant interactions or other evidence that math scores were differentially affected by attrition across conditions (p’s > .31). 
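The attrition check above is a chi-square test of independence on a 2 x 2 table (condition by whether a student was missing all posttest data). The minimal sketch below assumes a Pearson test without continuity correction; the per-condition counts of missing students are not reported here, so the cell counts in the usage line are hypothetical. For one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(√(χ²/2)), so the standard library is sufficient.

```python
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = condition, columns = missing / not missing.
    Returns (statistic, p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_df1(stat)


# The reported statistic, chi-square(1) = 3.03, corresponds to p of about .082.
print(round(chi2_sf_df1(3.03), 3))

# Hypothetical counts (not from the study), only to show the 2 x 2 computation.
stat, p = chi2_2x2(10, 20, 20, 10)
print(round(stat, 2), round(p, 4))
```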
Impact on Student Outcomes 

Table 2 presents the results of the statistical models comparing gains between 2:1 and 5:1 ROOTS groups. These models tested fixed effects for differences between conditions at pretest (2:1 ROOTS group effect), gains across time, and the interaction between the two. We found no statistically significant differences at pretest (p’s > .72), which suggests that students in the two conditions were similar in the fall of kindergarten. We also found no statistically significant differences by condition in gains from fall to spring (p’s > .25). The time x condition models estimated differences in gains between conditions of -0.04 for the NSB (Hedges’ g = -0.01), 1.49 for the ASPENS (g = 0.04), 0.03 for oral counting (g < 0.01), 0.70 for the TEMA-3 standard score (g = 0.10), and -0.03 for the RAENS (g = -0.01). The analysis of covariance model estimated a difference between 2:1 and 5:1 ROOTS groups of 1.01 on the SESAT (g = 0.03, p = .725). 
Impact on Student Practice and Quality of Explicit Math Instruction 

Table 3 presents descriptive statistics for the observed rates per minute of student practice opportunities and QEMI ratings, as well as results of independent samples t-tests comparing these outcomes by condition. Compared to the 5:1 ROOTS groups, the 2:1 groups experienced higher rates of individual practice (t = 2.95, p = .004, g = 0.51) and lower rates of group practice (t = -2.12, p = .036, g = -0.36). We found no effects of condition on the rate of guided practice (t = 1.15, p = .254, g = 0.20), independent practice (t = 0.80, p = .424, g = 0.14), or all practice combined (t = 1.45, p = .150, g = 0.24). With respect to QEMI ratings, 2:1 ROOTS groups had higher quality individual practice opportunities than 5:1 groups (t = 2.19, p = .031, g = 0.37). We observed no effects of condition on the other QEMI items or the QEMI total score (p’s > .13). 
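Because the t-tests in Table 3 compare groups of known size (69 and 67 ROOTS groups), each t statistic can be converted back to a standardized mean difference as a rough consistency check. The sketch below assumes the standard independent-samples conversion d = t·√(1/n₁ + 1/n₂) followed by the same small-sample correction; the authors may instead have computed g from unrounded means and standard deviations, so small differences in the second decimal are expected. The function name is hypothetical.

```python
import math


def t_to_hedges_g(t: float, n1: int, n2: int) -> float:
    """Convert an independent-samples t statistic to Hedges' g."""
    d = t * math.sqrt(1 / n1 + 1 / n2)   # Cohen's d recovered from t
    omega = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction
    return omega * d


N_2TO1, N_5TO1 = 69, 67  # numbers of 2:1 and 5:1 ROOTS groups (Table 3 note)
for label, t in [("Individual practice", 2.95),
                 ("Group practice", -2.12),
                 ("QEMI individual practice", 2.19)]:
    print(f"{label}: g = {t_to_hedges_g(t, N_2TO1, N_5TO1):.2f}")
```

With these inputs the conversion gives roughly 0.50, -0.36, and 0.37, close to the 0.51, -0.36, and 0.37 reported in Table 3.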
< Table 3 here > 
Discussion 

The purpose of this study was to examine whether an experimental manipulation of group 
size affected the overall intervention impact and treatment intensity of a Tier 2 mathematics 
intervention. The frequency of student practice opportunities and the quality of explicit 
mathematics instruction served as metrics of the intervention’s treatment intensity. The study 
investigated three research questions. 

Results for our first research question showed no statistically significant differences in student mathematics outcomes between the 2:1 and 5:1 ROOTS groups; students in both conditions demonstrated comparable performance on the six mathematics outcome measures. For our second research question, we found that the 2:1 and 5:1 ROOTS groups facilitated similarly high rates of guided and independent practice opportunities but differed in how frequently they facilitated group and individual practice opportunities. Specifically, observation data revealed higher rates of individual practice opportunities in the 2:1 ROOTS groups and more group-level practice in the 5:1 ROOTS groups. These data suggest that students in both the 2:1 and 5:1 groups received intensive learning experiences. 

Collectively, findings from the current study’s first two research questions replicated those reported in Author et al. (2017). As juxtaposed in Table 4, the two studies reported comparable effect sizes, and all effect sizes from the current study fell within the 95% confidence intervals of the study conducted by Author et al. (2017). Because replication is a fundamental principle of scientific research (Coyne et al., 2016; Feuer, Towne, & Shavelson, 2002; Gottfredson et al., 2015; Valentine et al., 2011), we contend that establishing the generalizability of our findings across participants and classrooms from a different geographical region is important not only for our program of research but also for the field at large. Given that the students in the 5:1 ROOTS groups in both Author et al. (2017) and the current study performed commensurately with their peers in the 2:1 ROOTS groups, these findings contribute to a convergence of evidence in support of delivering the ROOTS intervention in conventionally sized small groups (i.e., five students). Perhaps as important, we believe our findings have potential implications for how today’s schools allocate resources when delivering Tier 2 mathematics instruction to students at risk for persistent difficulties in mathematics. As schools across the nation continue to face financial shortages, they are constantly searching for ways to “do more with less,” particularly in terms of human capital. Our results, while preliminary, suggest that schools may be able to use fewer interventionists to intervene with more at-risk kindergarten students at one time. 
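The replication criterion invoked above (each current effect size falling inside the corresponding 95% confidence interval from Author et al., 2017) can be checked mechanically from Table 4. The sketch below hard-codes the Table 4 values; it illustrates the comparison only and is not a formal test of replication.

```python
# Hedges' g point estimates from the current study and 95% CIs from
# Author et al. (2017), transcribed from Table 4.
PRIOR_CI = {
    "NSB": (-0.20, 0.20),
    "ASPENS": (-0.32, 0.05),
    "Oral Counting": (-0.12, 0.29),
    "TEMA-3": (-0.17, 0.15),
    "RAENS": (-0.17, 0.22),
    "SESAT Total": (-0.14, 0.21),
}
CURRENT_G = {
    "NSB": -0.01,
    "ASPENS": 0.04,
    "Oral Counting": 0.00,
    "TEMA-3": 0.10,
    "RAENS": -0.01,
    "SESAT Total": 0.03,
}


def within_prior_ci(measure: str) -> bool:
    """True if the current estimate lies inside the earlier study's 95% CI."""
    low, high = PRIOR_CI[measure]
    return low <= CURRENT_G[measure] <= high


results = {m: within_prior_ci(m) for m in PRIOR_CI}
print(results)
print("All inside prior CIs:", all(results.values()))
```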
< Table 4 here > 

To extend our work on treatment intensity, we also examined whether the quality of explicit mathematics instruction varied by group size; such instructional quality data were not investigated in Author et al. (2017). Results for our third research question indicated that groups did not differ in overall quality. However, a statistically significant difference was found in the quality of individual practice opportunities: observers rated the individual practice opportunities facilitated in the 2:1 ROOTS groups as richer and more meaningful than those observed in the larger ROOTS groups. 
Implications for Research and Practice 

One implication of this study is the value of researchers expanding the extant literature base on group size and treatment intensity in mathematics intervention research. To our knowledge, few studies have concurrently investigated these two highly important variables in the context of early mathematics interventions. Establishing the optimal size of instructional groups could provide schools with important information on how best to intensify learning opportunities for students with MD. Future research is therefore warranted in this area. 

Additionally, our research may shed light on the possibility of a “threshold effect” of student practice. Although we hypothesized that the 2:1 groups would outperform the 5:1 groups, both group sizes were expected to provide students with intensive learning experiences based on the instructional design of the ROOTS intervention. Therefore, the potential yield of additional practice in the smaller groups may have diminished once a certain rate or threshold was reached during instruction. Given the possibility of threshold effects, future research is needed to establish optimal rates of student practice opportunities for teachers to provide when teaching students with or at risk for MD. 

Another potential implication stems from the notion of peer learning (Fuchs & Fuchs, 1998). Although this was not formally measured, students in the 5:1 ROOTS groups may have had more opportunities to learn from their peers, which, in turn, may have added value to the overall treatment effect for these groups. For example, when using concrete materials, students in the 5:1 groups may have been able to observe more vividly what they were expected to do during a mathematical task or activity. The same may have been true of student mathematics verbalizations: students in the 5:1 groups may have benefited from hearing a wider range of mathematical thinking. Future work should explore the role peer learning opportunities play in increasing the overall impact and treatment intensity of small-group mathematics interventions. 

Relatedly, while our work in the area of treatment intensity has focused extensively on 
the frequency and quality of student practice opportunities, a logical next step for future research 
might be to apply a treatment intensity framework, such as the one proposed by Warren et al. 
(2007). Utilizing the Warren et al. framework might offer a more comprehensive way to measure 
the treatment intensity of the ROOTS intervention. For example, in a future efficacy trial, 
application of their framework would enable us to compare the cumulative intervention intensity 
of ROOTS to a different Tier 2 intervention that represents the counterfactual condition. 

Finally, we see practical value in our work on student practice opportunities. Research 
across a variety of disciplinary fields, including music and sports (Ericsson, Roring, & 
Nandagopal, 2007), neuroscience (Field, 2005), as well as cognitive and educational psychology 
(Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013) has begun to shed light on the 
importance of practice. In early mathematics, practice is essential for building mathematical 
proficiency among the full range of learners, including students with MD. As shown in the 
current study, students in the 5:1 and 2:1 groups received frequent opportunities to practice with 
the critical concepts and skills of whole number and operations. We therefore encourage teachers 
to facilitate frequent, meaningful practice opportunities when working with students with MD. 
Limitations 

A number of limitations must be considered when interpreting our results. First, each ROOTS group was observed only three times. While this was primarily driven by resource constraints in the larger efficacy trial, more observations would likely permit a deeper understanding of the treatment intensity of ROOTS. Another possible limitation was that our replication study did not consider other potential dimensions of treatment intensity. For example, examining the duration or complexity of the targeted student practice opportunities may provide further insight into the treatment intensity of the 2:1 and 5:1 ROOTS groups. Relatedly, the current study included the quality of explicit instruction as a metric of treatment intensity, and results suggested that the quality of individual practice opportunities was higher in the 2:1 ROOTS groups. Although observers were blind to our research hypotheses, it is plausible that they were partial to the smaller groups, which may have influenced the quality ratings. Also, the same group of researchers carried out the current study and the initial RCT. Author overlap can introduce bias in replication research (Coyne et al., 2016). We contend, however, that the likelihood of this type of bias was largely mitigated through the inclusion of an external independent evaluator. Finally, this study focused specifically on the ROOTS intervention. Therefore, future research is warranted to determine whether our findings replicate with other Tier 2 mathematics interventions. 
Conclusion 

Building a converging knowledge base of effective mathematics instruction is paramount to supporting the development of mathematical proficiency for students with MD. One way to help crystallize the mathematics intervention literature is not only to establish the efficacy of mathematics interventions but also to examine alterable variables, such as group size, that are hypothesized to increase their treatment intensity. Investigations that employ this type of dual focus, such as the current study, have the potential to contribute to the knowledge base of effective mathematics instruction for students with intensive learning needs in mathematics. 
Table 1

Descriptive Statistics for Student Variables by Assessment Time and Condition

                      Fall of Kindergarten                       Spring of Kindergarten
Measure               2:1 ROOTS             5:1 ROOTS             2:1 ROOTS             5:1 ROOTS
Demographics          M (SD) or %           M (SD) or %
  Age at pretest      5.3 (0.5)             5.3 (0.4)
  Male                45%                   47%
  Race
    Asian             1%                    2%
    Black             7%                    6%
    White             59%                   59%
    More than one     1%                    1%
    Unknown           33%                   31%
  Hispanic            50%                   44%
  LEP                 18%                   22%
  SPED eligible       8%                    5%
Outcomes              M (SD)         n      M (SD)         n      M (SD)         n      M (SD)         n
  NSB                 12.3 (3.9)     138    12.1 (3.8)     326    20.2 (4.8)     131    20.0 (4.9)     294
  ASPENS              21.6 (16.8)    137    21.1 (16.5)    323    94.8 (33.1)    131    92.9 (34.1)    294
  Oral Counting       19.5 (12.5)    138    19.8 (13.1)    326    47.2 (21.0)    131    47.3 (21.8)    293
  TEMA-3              16.8 (7.1)     137    16.9 (6.9)     323    27.6 (6.6)     134    27.1 (7.6)     300
  RAENS               11.7 (5.7)     137    11.4 (5.7)     324    24.7 (5.6)     134    24.6 (5.5)     301
  SESAT Total                                                     473.1 (34.4)   130    465.0 (36.7)   295

Note. The complete sample included 138 students in the 2:1 ROOTS group condition and 327 students in the 5:1 ROOTS group condition. The sample sizes (n) represent students with a particular measure at each assessment period. LEP = Limited English proficiency; NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA-3 = Test of Early Mathematics Ability (3rd edition); RAENS = ROOTS Assessment of Early Numeracy Skills; SESAT = Stanford Early School Achievement Test.
Table 2

Results from Nested Time x Condition Analyses of Fall-to-Spring Gains in Math Comparing 2:1 and 5:1 ROOTS Groups

                                      NSB        ASPENS      Oral Counting   TEMA-3     RAENS
Fixed effects
  Intercept                           12.17***   21.20***    19.75***        16.95***   11.48***
                                      (0.35)     (1.94)      (1.20)          (0.58)     (0.44)
  Time                                7.85***    71.83***    27.68***        10.03***   13.08***
                                      (0.28)     (1.92)      (1.44)          (0.36)     (0.36)
  ROOTS group                         0.19       0.20        -0.21           -0.06      0.25
                                      (0.55)     (3.13)      (2.00)          (0.90)     (0.70)
  Time x ROOTS group                  -0.04      1.49        0.03            0.70       -0.03
                                      (0.48)     (3.25)      (2.39)          (0.61)     (0.60)
Variances
  ROOTS group intercept               4.76***    99.48**     13.04           13.05***   6.75***
                                      (1.04)     (30.41)     (11.98)         (2.91)     (1.66)
  ROOTS group gains                   0.71       36.90       28.15*          1.60*      1.46*
                                      (0.46)     (20.29)     (12.18)         (0.78)     (0.72)
  Student                             4.53***    167.50***   83.33***        23.82***   10.49***
                                      (0.80)     (32.04)     (15.66)         (2.42)     (1.44)
  Residual                            8.83***    384.28***   184.70***       12.55***   12.79***
                                      (0.71)     (30.36)     (15.13)         (1.03)     (1.03)
Hedges’ g (Time x ROOTS group)        -0.01      0.04        < 0.01          0.10       -0.01
p-value (Time x ROOTS group)          .9334      .6480       .9905           .2515      .9629
df (Time x ROOTS group)               173        175         148             154        167

Note. Table entries show parameter estimates with standard errors in parentheses, except for Hedges’ g values, p-values, and the degrees of freedom (df). Tests of fixed effects (first four rows) accounted for small groups as the unit of analysis within the 2:1 and 5:1 ROOTS conditions. NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA-3 = Test of Early Mathematics Ability (3rd edition); RAENS = ROOTS Assessment of Early Numeracy Skills.
*p < .05. **p < .01. ***p < .001.
Table 3

Results of Independent Samples t-tests Comparing Rates of Student Practice, Quality of Explicit Instruction, and Fidelity of Implementation by Size of ROOTS Group

                                                 2:1 ROOTS      5:1 ROOTS
                                                 groups,        groups,
Variable                                         M (SD)         M (SD)         t        p       Hedges’ g
Rates of student practice opportunities
  Guided practice                                0.9 (0.5)      0.8 (0.4)      1.15     .254    0.20
  Independent practice                           3.0 (0.7)      2.9 (0.7)      0.80     .424    0.14
  Individual practice                            2.2 (0.9)      1.8 (0.7)      2.95     .004    0.51
  Group practice                                 1.7 (0.7)      1.9 (0.5)      -2.12    .036    -0.36
  All practice                                   3.8 (0.8)      3.7 (0.7)      1.45     .150    0.24
Quality of explicit math instruction (QEMI)
  Efficient delivery of instruction              3.2 (0.6)      3.1 (0.6)      0.92     .361    0.15
  Student participation                          3.2 (0.5)      3.1 (0.5)      1.23     .221    0.21
  Effective teacher modeling                     3.2 (0.5)      3.2 (0.5)      0.45     .625    0.08
  Group practice opportunities                   3.1 (0.5)      3.2 (0.5)      -0.39    .698    -0.06
  Checks of understanding                        3.3 (0.5)      3.2 (0.5)      1.05     .297    0.16
  Individual practice opportunities              3.3 (0.5)      3.1 (0.5)      2.19     .031    0.37
  Instructional scaffolding                      3.2 (0.5)      3.1 (0.6)      1.53     .128    0.26
  Total QEMI score                               3.2 (0.4)      3.1 (0.5)      1.14     .255    0.20
Fidelity of implementation
  1. Number of activities taught out of 5        4.2 (0.4)      4.2 (0.4)      -0.04    .972    0.00
  2. Met math objectives                         3.6 (0.5)      3.5 (0.5)      0.97     .336    0.15
  3. Followed teacher scripting                  3.5 (0.5)      3.4 (0.5)      1.33     .185    0.24
  4. Used prescribed math models                 3.7 (0.4)      3.6 (0.5)      1.34     .183    0.23
  Total fidelity                                 3.6 (0.4)      3.5 (0.5)      1.29     .199    0.21
Average observation duration in minutes          20.7 (3.7)     22.9 (4.1)     -3.30    .001    -0.57

Note. M = mean, SD = standard deviation. Group t tests were based on 69 2:1 ROOTS groups and 67 5:1 ROOTS groups (134 degrees of freedom). Quality of explicit math instruction was rated from 1 = not present to 4 = highly present. Total instructional quality was calculated as the mean across items. Fidelity of implementation items 2 through 4 were rated from 1 = none to 4 = all. Total fidelity was calculated as the mean across items 2 through 4.
Table 4

Published and Replicated Effect Sizes from the Time x Condition Analyses of Fall-to-Spring Gains in Math Comparing 2:1 and 5:1 ROOTS Groups

Outcomes          Author et al. (2017)       Author et al. (under review)
NSB               0.00 [-0.20, 0.20]         -0.01 [-0.20, 0.19]
ASPENS            -0.14 [-0.32, 0.05]        0.04 [-0.14, 0.23]
Oral Counting     0.08 [-0.12, 0.29]         0.00 [-0.22, 0.22]
TEMA-3            -0.01 [-0.17, 0.15]        0.10 [-0.07, 0.26]
RAENS             0.03 [-0.17, 0.22]         -0.01 [-0.22, 0.21]
SESAT Total       0.03 [-0.14, 0.21]         0.03 [-0.13, 0.19]

Note. Table entries show Hedges’ g effect size estimates with 95% confidence intervals in brackets. NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA-3 = Test of Early Mathematics Ability (3rd edition); RAENS = ROOTS Assessment of Early Numeracy Skills; SESAT = Stanford Early School Achievement Test.


